Conceptual Indexing Based on Document Content Representation
Authors
Abstract
This paper addresses an important problem in the use of semantics in information retrieval (IR): how to represent document semantics and how to use that representation properly in retrieval. The approach we propose represents the content of a document by its best semantic network, called the document semantic core, which is built in two main steps. In the first step, concepts (words and phrases) are extracted from the document, driven by an external general-purpose ontology, namely WordNet. In the second step, a global disambiguation of the extracted concepts with respect to the document yields the best semantic network. The selected concepts form the nodes of the semantic network, and the similarity values between connected nodes weight the links. The resulting scored concepts are used for conceptual indexing of the document in information retrieval.
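The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `SENSES` table is a hypothetical toy sense inventory standing in for WordNet, `similarity` is an assumed Jaccard overlap of gloss words rather than the paper's similarity measure, and the exhaustive search in `semantic_core` is only practical for a handful of terms.

```python
from itertools import combinations, product

# Hypothetical toy sense inventory standing in for WordNet (an assumption):
# each term maps to its candidate senses, each sense to a set of gloss words.
SENSES = {
    "bank": {
        "bank#finance": {"money", "deposit", "loan", "institution"},
        "bank#river": {"river", "water", "slope", "land"},
    },
    "loan": {
        "loan#finance": {"money", "lend", "interest", "institution"},
    },
    "interest": {
        "interest#finance": {"money", "loan", "rate", "lend"},
        "interest#curiosity": {"attention", "curiosity", "feeling"},
    },
}


def similarity(gloss_a, gloss_b):
    """Jaccard overlap of gloss words (an assumed stand-in measure)."""
    union = gloss_a | gloss_b
    return len(gloss_a & gloss_b) / len(union) if union else 0.0


def semantic_core(terms):
    """Return (concept scores, weighted edges) for the best sense assignment."""
    # Step 1: keep only the terms that the ontology knows about.
    concepts = [t for t in terms if t in SENSES]

    # Step 2: global disambiguation. Try every combination of one sense per
    # concept and keep the one with the highest total pairwise similarity.
    best_total, best_choice = -1.0, ()
    for choice in product(*(SENSES[t].items() for t in concepts)):
        total = sum(
            similarity(g1, g2) for (_, g1), (_, g2) in combinations(choice, 2)
        )
        if total > best_total:
            best_total, best_choice = total, choice

    # Nodes are the selected senses; links are weighted by similarity.
    edges = {
        (s1, s2): similarity(g1, g2)
        for (s1, g1), (s2, g2) in combinations(best_choice, 2)
    }

    # Score each concept by the summed weight of its links.
    scores = {sense: 0.0 for sense, _ in best_choice}
    for (s1, s2), weight in edges.items():
        scores[s1] += weight
        scores[s2] += weight
    return scores, edges


if __name__ == "__main__":
    scores, edges = semantic_core(["bank", "loan", "interest", "unknown"])
    for sense, score in sorted(scores.items()):
        print(f"{sense}\t{score:.3f}")
```

In this toy example the financial senses reinforce one another, so they are selected together and each receives a nonzero score that could serve as an index weight.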
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Photograph Indexing and Retrieval Using Star-graphs
We present in this paper a relational approach for indexing and retrieving photographs from a collection. Instead of using simple keywords as an indexing language, we propose to use star-graphs as document descriptors. A star-graph is a conceptual graph that contains a single relation, with some concepts linked to it. They are elementary pieces of information describing combinations of concepts...
Reflections on Image Indexing: A Picture Is Worth a Thousand Words
Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing, and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...
Aggregation-Based Structured Text Retrieval
DEFINITION Text retrieval is concerned with the retrieval of documents in response to user queries. This is achieved by (i) representing documents and queries with indexing features that provide a characterisation of their information content, and (ii) defining a function that uses these representations to perform retrieval. Structured text retrieval introduces a finer-grained retrieval paradig...
A Conceptual Annotation Approach to Indexing in a Web-Based Information System
All the specialists have agreed that the possibility of adding to multimedia WWW objects some sort of ‘conceptual’ annotations describing their information content would greatly contribute to solving the problem of their ‘intelligent’ indexing and retrieval. We propose to associate with the Web objects not the final conceptual annotation, but a simple natural language (NL) caption, in the form of...
Journal title:
Volume, issue
Pages -
Publication date: 2005